SURFACE COMPUTER AND COMPUTING METHOD USING THE SAME 
CROSS-REFERENCE TO RELATED APPLICATION 

[0001] This application is a continuation of Application No. 
09/703,071 filed on October 31, 2000, the disclosure of which 
is hereby incorporated by reference herein. 
BACKGROUND OF THE INVENTION 

[0002] The present invention relates to a surface computer
and a computing method capable of performing various types of
complex computations at high speed, such as physical
computations, environmental computations, behavior
computations, emotion computations and the like, by
concurrently computing computation data contained in a
two-dimensional region in units of two-dimensional regions.
[0003] Recently, various fields of natural science and
engineering have progressed remarkably. These fields require
large-scale physical computations. For example, large-scale
matrix computations must be performed in space development
projects, fluid dynamics, and quantum mechanics. To perform
such computations at high speed, a computer must be optimized
for them.

[0004] Conventional computers, particularly personal
computers, have progressed enough to surpass older
general-purpose computers (so-called "mainframes"). However,
it is difficult for such personal computers to perform the
above-described large-scale computations at high speed; they
take a long time to carry out such computations. Possible
reasons are delays in data transfer speed, data processing
speed, and the like within the computers.

[0005] Therefore, among developers, researchers, and the 
like who must perform the above large-scale physical 
computations, computers which can perform large-scale 
computations at high speed have long been desired. 
SUMMARY OF THE INVENTION 

[0006] Accordingly, it is an object of the present invention 
to provide a computer having a novel architecture and a 
computing method using the same capable of performing large- 
scale computation at high speed. 

[0007] To this end, according to a first aspect of the present
invention, there is provided a surface computer including an
address generator for generating an address for adjusting
surface region data concerning at least a storage region, and a
concurrent computer, provided at a subsequent stage of the
address generator, having a plurality of unit computers.
[0008] According to a second aspect of the present invention,
a surface computer includes an address generator for
generating an address for adjusting surface region data
concerning at least a storage region; a concurrent computer,
provided at a subsequent stage of the address generator,
having a plurality of unit computers; and a storage unit
connected to the concurrent computer.

[0009] According to a third aspect of the present invention, 
a surface computer includes an address generator for 
generating an address for adjusting surface region data 
concerning at least a storage region, and a concurrent 
computer, provided at a subsequent stage of the address 
generator, having a plurality of unit computers, wherein the
region specified by an operand constituting an instruction 
word is a line. 

[0010] According to a fourth aspect of the present invention, 
a surface computer includes an address generator for 
generating an address for adjusting surface region data 
concerning at least a storage region, and a concurrent 
computer, provided at a subsequent stage of the address 
generator, having a plurality of unit computers, wherein the 
region specified by an operand constituting an instruction 
word is a surface region extending two-dimensionally.
[0011] According to a fifth aspect of the present invention, 
a surface computer includes a data bus, having a large bus 
width, allowing a processing block and a storage block formed 
in one chip to be connected therebetween. In the surface 
computer, the processing block includes an address generator 
for generating an address for adjusting surface region data 
concerning a storage region and a concurrent computer, 
provided at a subsequent stage of the address generator, 
having a plurality of unit computers, and the storage block 
includes DRAM. 

[0012] According to a sixth aspect of the present invention,
in a surface computer including an address generator, a
processing block having a concurrent computer comprising a
plurality of unit computers, a storage block, and a data bus
having a large bus width and allowing the processing block and
the storage block to be connected therebetween, a computing
method includes an address generating step for causing the
address generator to generate an address for adjusting surface
region data concerning a storage region, and a processing step
for causing the concurrent computer to process the surface
region data.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0013] Fig. 1 is a block diagram of the construction of a
surface computer according to the present invention;

[0014] Fig. 2 is a flowchart illustrating operations of the
surface computer shown in Fig. 1;

[0015] Figs. 3A and 3B are diagrams illustrating enlarging
processing of a surface region;

[0016] Figs. 4A and 4B are diagrams illustrating reducing
processing of the surface region;

[0017] Figs. 5A and 5B are diagrams illustrating
interpolation processing;

[0018] Fig. 6 is a diagram illustrating pattern matching
processing;

[0019] Figs. 7A, 7B, and 7C are diagrams illustrating the
differences between conventional SISD-type and SIMD-type
computers and the surface computer;

[0020] Fig. 8 is a diagram illustrating computing
processing when matrix computation is performed using the
SISD-type computer;

[0021] Fig. 9 is a diagram illustrating computing
processing when matrix computation is performed using the
SIMD-type computer;

[0022] Fig. 10 is a diagram illustrating computing
processing when matrix computation is performed using the
surface computer;

[0023] Fig. 11 is an illustration of computation processing
in the surface computer;

[0024] Fig. 12 is a block diagram showing another
construction of the main part of the surface computer;

[0025] Fig. 13 is a diagram showing an instruction word;

[0026] Fig. 14 is a diagram illustrating an example operand
and an interpolating operation of the surface computer;

[0027] Fig. 15 is a diagram illustrating concurrent
processing of a processor array;

[0028] Fig. 16 is a diagram illustrating the flow of
instructions and data in a construction in which the surface
computer is used as a coprocessor;

[0029] Fig. 17 is a diagram illustrating conditional branch
processing of the surface computer;

[0030] Fig. 18 is a diagram illustrating a direct operand
of the surface computer;

[0031] Figs. 19A, 19B, 19C, and 19D are diagrams
illustrating computing processing using two operands, three
operands, and four operands, respectively;

[0032] Fig. 20 is a diagram illustrating an example
computation of the surface computer; and

[0033] Fig. 21 is a diagram illustrating another example
computation of the surface computer.
DETAILED DESCRIPTION 

[0034] Embodiments of a surface computer and a computing
method using the same according to the present invention are
described with reference to the attached drawings. Identical
elements are denoted by the same reference numerals, and
repeated descriptions thereof are omitted.
[Design Ideas of Surface Computer]

[0035] Fig. 1 shows the overall construction of the surface
computer according to the present invention. The basic design
of the surface computer is based on the following design ideas:

(1) A processing block and a memory block are formed on
one chip using a semiconductor manufacturing method
used for ASIC-DRAM (Application-Specific Integrated
Circuit / Dynamic Random Access Memory).

[0036] In typical processors of the past, the processing
block and the memory block were formed discretely.
Therefore, memory access from the processing block to the
memory block during processing was a performance bottleneck.
However, improvements in recent semiconductor manufacturing
technology have led to embedded (mixed) LSIs (Large-Scale
Integration) in which the processing block and the memory
block are integrated on one chip. Integrating both blocks on
one chip shortens the distance between them, which accelerates
data transfer and solves the foregoing memory access problem.

[0037] ASIC originally meant a special-purpose,
semi-customized integrated circuit. Recently, however,
ASIC-DRAM primarily means a DRAM/logic integrated
semiconductor element, which is in great demand as a custom
integrated circuit, or the manufacturing process therefor.

(2) An internal data bus having a large data bus width
interconnects the processing block and the memory block.

[0038] Experience shows that when physical computations are
performed at high speed, the data transfer speed between the
memory block and the processing block is a performance
bottleneck more frequently than the processing speed of the
processing block. To avoid this bottleneck, the data bus used
for data transfer between the processing block and the memory
block is designed to have a large bus width.

(3) The processing block uses a construction which
enables surface region data to be handled as a unit, so that
a certain quantity of data can be handled at a time.

[0039] "Surface region data" means a set of computation data
that exists in a surface region. Accordingly, a "surface
computer" is a computer that specifies surface region data
based on a surface or a surface region, manipulates it (i.e.,
reads the data from memory and writes it back to memory), and
performs computing processing on it. Since the surface computer
processes surface region data, which is a group containing a
fairly large quantity of data, the processed data is inherently
concurrent; that is, the data has a structure that allows it to
be processed concurrently.

[0040] When the surface computer is developed on the basis
of the above design ideas and the data bus between the
processing block and the memory block is made sufficiently
wide, the processing speed of the processing block is expected
to become the relative performance bottleneck. To avoid this
problem, the data to be processed is handled as surface region
data: a large group of data is transferred in one or a few
transfers and is processed in units of data groups. The large
data bus width of the surface computer makes such transfer of
surface region data possible.

[0041] The surface computer according to the present 
invention includes a processor array having a plurality of 
unit computers in the processing block so that a certain large 
quantity of data can be efficiently processed. 

[0042] Hereinafter, the concrete construction of the 
surface computer based on the above design ideas is described. 

[Construction of Surface Computer and Operations of Each 
Component Thereof] 

(Construction of Surface Computer) 

[0043] Fig. 1 shows the overall construction of the surface
computer according to the present invention. The surface
computer shown in Fig. 1 is roughly divided into a processing
block P and a memory block M. As described above, a
conventional processing block and a conventional memory block
have been formed discretely. In this surface computer,
however, the processing block P is formed in a manner which is
dedicated to the DRAM and which takes advantage of the
ASIC-DRAM. Thus, the surface computer is constructed with the
processing block P and the memory block M formed on one chip.

[0044] The processing block P includes an IFU (Instruction
Fetch Unit) 2, a PP (Preprocessor) 9, an AG (Address
Generator) 3, and a PA (Processor Array) 4.

[0045] The memory block M includes a DRAM 1. 

[0046] Other than the processing block P and the memory 
block M, the surface computer includes an IFDTC (Instruction
Fetch and Data Transfer Controller) 5 and a BUS I/F (BUS 
Interface) 6. 

[0047] This surface computer is implemented as a one-chip
LSI incorporating the above-described functions.

[0048] This surface computer is connected to a main memory
7 and a central processing unit (CPU) 8, both of which exist
outside the surface computer. Although the external processing
unit serves as the CPU in this embodiment, the surface computer
is not necessarily limited to use as a local processing unit;
the surface computer itself may serve as the central processing
unit.

[0049] The IFDTC 5 is a control device that, under the
control of the CPU 8, controls the fetching of surface
computing instructions to be executed by the surface computer
and the transfer of surface region data to be processed by the
surface computer. The operation of the IFDTC 5 allows the
processing block P of the surface computer to serve as a
coprocessor of the CPU 8, which reduces the load on the CPU 8.

[0050] The IFU 2 fetches a list of surface computing
instructions executed on the surface computer from the main
memory 7 under the control of the IFDTC 5.

[0051] The PP 9 computes the surface region corresponding
to an operand of a surface computing instruction word and
issues an instruction to transfer the computation data
corresponding to this surface region from the DRAM 1 to the
primary and secondary sense-amplifiers 10 and 11 (buffers)
included in the DRAM 1. The primary sense-amplifier 10 and
the secondary sense-amplifier 11 serve as a cache memory.
Furthermore, the PP 9 sets up the AG 3 in accordance with the
operand. In addition, the PP 9 decodes the general-purpose
surface computing instructions which are read from the main
memory 7 and fetched via the IFU 2.

[0052] The AG 3 generates addresses for providing the data
fetched in the sense-amplifiers 10 and 11 to the PA 4,
computes location dependence information within the surface
regions, and outputs the resultant location dependence
information to the PA 4. Computation of location dependence
information means, for example, address computation of surface
region data when the sizes of the surface regions which
determine the corresponding surface region data (the objects
to be computed) differ, or when an address having data in one
surface region corresponds to an address having no data in the
other surface region. Enlarging/reducing processing,
interpolation processing such as bilinear or tri-linear
interpolation, and the like used for computing location
dependence information are described below.

[0053] Since the surface computer handles the surface 
region data, the structures of the operands of instruction 
words which specify the surface region data are different from 
those of conventional computers. Details of these operands 
are also described below. 

[0054] The PA 4 computes the surface region data. To
achieve high-speed computations, the PA 4 is constructed as
a processor array including, for example, 16 unit computers 4a.
However, the number of the unit computers 4a of the PA 4 is
not limited to 16. Moreover, to achieve even faster
computations, the PA 4 uses an architecture such as a
superscalar or superpipelined architecture. Superscalar and
superpipelined architectures are also described below.

[0055] The PA 4 processes the computation surface region 
data in each operand supplied from the sense-amplifiers 10 and 
11 based on an address code generated by the AG 3 in 
accordance with an operation code. 

[0056] In addition, the PA 4 includes a MEMIF (memory
interface) 12 for communicating with the memory block M. The
MEMIF 12 exchanges data with an external memory, such as
the main memory 7, and operates the primary sense-amplifier
10, which performs writing processing such as destination
operand region processing in which a read-modify-write
operation is performed.

[0057] Typically, the memory block M includes the DRAM 1.
The DRAM 1 is an internal memory which serves as a work area
for the surface computer, in contrast with the main memory 7,
which is outside of the surface computer. The storage
capacity of the DRAM 1 is, for example, 32 Mbits.

[0058] The DRAM 1 includes the primary sense-amplifier 10
(or buffer I), which serves as a cache memory for transferring
data between the MEMIF 12 and the DRAM 1, and the secondary
sense-amplifier 11 (or buffer II), which serves as a cache
memory for transferring data between the PA 4 and the DRAM 1.

[0059] The BUS I/F 6 outputs a list of general-purpose
surface computing instructions stored in the main memory 7 to
the IFU 2 and outputs the surface region data to the MEMIF 12.

[0060] As described above, the processing block P and the
memory block M are interconnected by data buses having a
large data bus width. Specifically, data is transmitted from
the MEMIF 12 of the PA 4 to the primary sense-amplifier 10 via
an internal data bus for writing 13 having a bus width of 1024
bits, while data is transmitted from the primary
sense-amplifier 10 of the DRAM 1 to the MEMIF 12 of the PA 4
via an internal data bus for reading 14 having a bus width of
1024 bits. The data bus 14 is used primarily for surface region
data transfer.

[0061] Furthermore, when the PA 4 includes, for example, 16
unit computers 4a each having a 32-bit data bus, the secondary
sense-amplifier 11 of the DRAM 1 and the PA 4 are
interconnected via a data bus 15 having a data bus width of
512 bits (= 16 × 32 bits). The data bus 15 is used for reading
operand information and the like which specify the surface
region data.

[0062] The PA 4 and the DRAM 1 are directly interconnected
using these three data buses 13, 14, and 15, whose combined
data bus width is as large as 2560 bits (1024 + 1024 + 512).
This prevents data transfer between the memory block M and the
processing block P from becoming the performance bottleneck.
This feature is exploited to transfer surface region data
containing large quantities of data. These bus widths are only
examples, and the present invention is not limited to any
particular data bus width.

[0063] The CPU 8 (outside the surface computer) outputs
instructions to the IFDTC 5, and outputs a list of
general-purpose surface computing instructions (corresponding
to a program) and surface region data via the BUS I/F 6 to the
IFU 2 and the MEMIF 12, respectively. The list of
general-purpose surface computing instructions and the surface
region data used in this surface computer are prestored in the
main memory 7.

(Operations of Each Component of Surface Computer) 
[0064] Fig. 2 is a flowchart illustrating the operations of
each component of the surface computer having the
above-described construction.

[0065] With reference to Fig. 1, the CPU 8 controls the
overall operations of the IFDTC 5.

[0066] At step S10, under the control of the IFDTC 5, the 
IFU 2 fetches the surface computing instruction output from 
the main memory 7 via the BUS I/F 6 and transfers the 
instruction to the PP 9. 

[0067] At step S20, when the PP 9 receives the surface
computing instruction, it preprocesses (computes) the surface
region corresponding to an operand in the instruction word of
this surface computing instruction. Subsequently, the PP 9
outputs an instruction so that the surface region data
specified by the resultant computation is transferred from the
DRAM 1 to the primary sense-amplifier 10 and/or the secondary
sense-amplifier 11. That is, the computation surface region
data is prefetched into the primary sense-amplifier 10 and the
secondary sense-amplifier 11. In addition, the PP 9 sets up
the AG 3 (the stage subsequent to the PP 9) in accordance with
the surface region information in the operand of the surface
computing instruction.

[0068] At step S30, the AG 3 performs address computation
to generate the addresses for providing, to the PA 4, the
surface region data fetched in the primary sense-amplifier 10
and/or the secondary sense-amplifier 11. In addition, the AG 3
computes location dependence information in the surface
region, and supplies the resultant address data and location
dependence information to the PA 4. This location dependence
information is, for example, the address of the surface region
data and, as described below, it contains information
concerning whether an address of surface region data has a
counterpart in the other surface region data.

[0069] At step S40, in accordance with the operation codes
of the surface computing instruction, the PA (processor array)
4 processes the computation surface region data in the operands
supplied from the primary sense-amplifier 10 and the secondary
sense-amplifier 11, based on the address data generated by the
AG 3. When data has no counterpart (that is, the data has no
corresponding address in the other surface region) due to a
difference in the operand information specifying the surface
region data, the adjacent data output from the sense-amplifiers
10 and 11 (buffers) are processed in accordance with the
operation code by means of interpolation processing, such as
bilinear or tri-linear interpolation, so that data having a
counterpart in the other surface region is generated.

[0070] At step S50, the MEMIF 12 exchanges data with the
main memory 7 via the BUS I/F 6. In addition, the MEMIF 12
operates the primary sense-amplifier 10, which writes data to
the DRAM 1, whereby, for example, destination operand region
processing and the like that execute a read-modify-write
operation (a reading operation followed by a writing
operation in one cycle) are performed.

[Address Generation] 

[0071] The address generation step (step S30) is described
further. When the PA 4 processes the surface region data in
accordance with the operation code of a surface computing
instruction, the two sets of surface region data sometimes do
not correspond to each other because of differences in the
operand information specifying each set.
[0072] For example, no correspondence occurs when a pair of
computation surface region data does not match, that is, when
the sizes of the surface region data differ. In this case, for
example, by appropriately enlarging or reducing the surface
region specified by the source operand so that it corresponds
to the surface region specified by the destination operand,
computation on both sets of surface region data can be
performed. This enlarging/reducing processing is typically
performed by means of copy processing or interpolation
processing.
[0073] In this manner, when the data size of each surface
region is determined by making the surface region specified
by the source operand correspond to the surface region
specified by the destination operand, the surface region data
is, for example, enlarged or reduced, or processed so as to
match a predetermined scale ratio of the surface region data.
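
The size matching and scale ratio can be illustrated with a hypothetical mapping from destination addresses to source addresses. This is a sketch under stated assumptions: the function name and the corner-aligned convention (the corner samples of both regions coincide) are not taken from the text. The returned flag says whether a destination address lands exactly on a stored source sample, which corresponds to the "has a counterpart" part of the location dependence information; where it is False, interpolation would be needed.

```python
from fractions import Fraction


def map_to_source(dst_x, dst_y, src_size, dst_size):
    """Map a destination address to a (possibly fractional) source address.

    src_size and dst_size are (width, height) in samples, each at least 1.
    Assumes the corner samples of both regions coincide ("align corners").
    Returns (sx, sy, has_counterpart): has_counterpart is True when (sx, sy)
    falls exactly on a stored source sample, i.e. no interpolation is needed.
    """
    (sw, sh), (dw, dh) = src_size, dst_size
    # Exact per-axis scale ratios; a one-sample axis maps everything to 0.
    rx = Fraction(sw - 1, dw - 1) if dw > 1 else Fraction(0)
    ry = Fraction(sh - 1, dh - 1) if dh > 1 else Fraction(0)
    sx, sy = dst_x * rx, dst_y * ry
    has_counterpart = sx.denominator == 1 and sy.denominator == 1
    return sx, sy, has_counterpart


# Enlarging a 2 x 2 source region to 3 x 3:
for y in range(3):
    print([map_to_source(x, y, (2, 2), (3, 3))[2] for x in range(3)])
# [True, False, True]
# [False, False, False]
# [True, False, True]
```

Exact fractions are used so that the "lands on a sample" test is not disturbed by floating-point rounding.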
[0074] In some cases, no correspondence exists between the
addresses of some or all of the computation data constituting
the corresponding surface regions. In this case, interpolation
processing is performed on one or both of the surface regions
to generate data at the corresponding addresses.

[0075] Even between surface regions having no corresponding 
computation data due to differences in operand information, 
such interpolation processing enables computation data in one 
surface region to have the counterpart data at the 
correspondence address of the other surface region. 
Accordingly, processing can be performed between an arbitrary 
pair of surface region data. As interpolation processing, 
typically bilinear filtering, tri-linear filtering, or the 
like is performed using adjacent data included in the surface 
region data. 

(Enlarging Processing of Surface Region) 
[0076] Figs. 3A and 3B show enlarging processing of the
surface region data. In the example shown in Fig. 3A, surface
region data "a" is enlarged by duplicating it and arranging a
plurality of the copies. In this case, a desired number of
copying steps are performed on the unenlarged surface region
data having the same area as the surface region data "a",
whereby enlarging processing can be performed.
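
Copy-based enlargement can be sketched as nearest-neighbour replication. This is one reading of Fig. 3A, assuming an integer enlargement factor k and a region stored as a list of rows; the function name is hypothetical.

```python
def enlarge_by_copy(region, k):
    """Enlarge a 2-D region (list of rows) by an integer factor k.

    Each element is copied into a k x k block, so no new values are created.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    out = []
    for row in region:
        wide = [v for v in row for _ in range(k)]  # copy each element k times horizontally
        out.extend(list(wide) for _ in range(k))   # copy the widened row k times vertically
    return out


print(enlarge_by_copy([[1, 2], [3, 4]], 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```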

[0077] Fig. 3B shows another enlarging processing by means
of interpolation processing. Four small surface regions "a",
"b", "c", and "d" of the surface region data before
enlargement are arranged as shown in Fig. 3B. When the
surface region data is enlarged, each of the small surface
regions is placed in the corresponding corner. Small surface
regions having the values (a + b)/2, (a + c)/2, (b + d)/2,
and (c + d)/2 are placed, respectively, between "a" and "b",
between "a" and "c", between "b" and "d", and between "c" and
"d". The part enclosed by regions "a", "b", "c", and "d" has
surface region data with the value (a + b + c + d)/4. Thus,
the surface region data consisting of small surface regions
"a", "b", "c", and "d" is enlarged.
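
The arrangement of Fig. 3B generalizes to a region of any size by placing the original samples at even positions and filling each position between them with the mean of its neighbouring samples. The sketch assumes numeric samples stored as a list of rows; the function name is hypothetical.

```python
def enlarge_by_interpolation(region):
    """Enlarge an h x w region to (2h - 1) x (2w - 1) as in Fig. 3B.

    Original samples stay at even positions; a position between two samples
    gets their mean, and a position enclosed by four samples gets the mean
    of all four, e.g. (a + b + c + d) / 4.
    """
    h, w = len(region), len(region[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for i in range(2 * h - 1):
        for j in range(2 * w - 1):
            # Source rows/columns that bracket output position (i, j).
            rows = [i // 2] if i % 2 == 0 else [i // 2, i // 2 + 1]
            cols = [j // 2] if j % 2 == 0 else [j // 2, j // 2 + 1]
            vals = [region[r][c] for r in rows for c in cols]
            out[i][j] = sum(vals) / len(vals)
    return out


# a, b, c, d = 0, 2, 4, 6
print(enlarge_by_interpolation([[0, 2], [4, 6]]))
# [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
```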
(Reducing Processing of Surface Region) 

[0078] Figs. 4A and 4B show reducing processing. Reducing
processing is performed in the reverse direction of the
enlarging processing shown in Fig. 3A or 3B. In Fig. 4A,
reducing processing is the reverse of the enlarging processing
by copying shown in Fig. 3A. In Fig. 4B, reducing processing is
the reverse of the enlarging processing by interpolation shown
in Fig. 3B.
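
Assuming that each reduction exactly undoes the corresponding enlargement, both cases amount to keeping a regular subset of samples: one sample per k × k block for the copy case, and the samples at even positions (the original corner samples) for the interpolation case. This is an assumption about what Figs. 4A and 4B show; other reductions, such as averaging blocks, are equally possible, and the function names are hypothetical.

```python
def reduce_by_copy(region, k):
    """Reverse of copy enlargement: keep the top-left element of each k x k block."""
    return [row[::k] for row in region[::k]]


def reduce_interpolated(region):
    """Reverse of the Fig. 3B enlargement: shrink a (2h - 1) x (2w - 1) region
    to h x w by keeping the original samples at even positions and dropping
    the interpolated ones."""
    return [row[::2] for row in region[::2]]


print(reduce_by_copy([[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]], 2))
# [[1, 2], [3, 4]]
print(reduce_interpolated([[0, 1, 2], [2, 3, 4], [4, 5, 6]]))
# [[0, 2], [4, 6]]
```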

(Bilinear Processing) 

[0079] Fig. 5A illustrates an example in which data is
generated by means of bilinear interpolation, as one example
of interpolation processing. In one computation surface region
data SD, the address represented by "O" in Fig. 5A has data
stored thereat while, in the other computation surface region
data SD, the address represented by "O" has no data stored
thereat. In this case, data is generated by interpolation
using the four data nearest to the address which is
represented by "O" and which has no data thereat. That is,
linear interpolation is performed in the oblique direction
toward the upper right as observed in the figure (for example,
in the direction of the x-axis), and linear interpolation is
performed vertically (for example, in the direction of the
y-axis). Because linear interpolation is applied along two
axes, this method is called a bilinear interpolation method.
[0080] Even when surface region data SD exists only
discretely, for example, only at the addresses corresponding to
the intersections of a grid as shown in Fig. 5A, a continuous
group of surface region data constituting this surface region
can be obtained by means of bilinear processing.
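
Bilinear interpolation at a fractional address can be sketched as two linear interpolations along x followed by one along y. The sketch assumes a region of at least 2 × 2 numeric samples stored as a list of rows, with x indexing columns and y indexing rows; the function name is hypothetical.

```python
import math


def bilinear(region, x, y):
    """Interpolate a 2-D region (list of rows) at fractional address (x, y).

    Uses the four samples surrounding (x, y): first along x, then along y.
    """
    h, w = len(region), len(region[0])
    if h < 2 or w < 2:
        raise ValueError("region must be at least 2 x 2")
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        raise ValueError("address lies outside the surface region")
    # Top-left sample of the enclosing cell, clamped so x0 + 1 and y0 + 1 exist.
    x0 = min(math.floor(x), w - 2)
    y0 = min(math.floor(y), h - 2)
    fx, fy = x - x0, y - y0
    top = region[y0][x0] * (1 - fx) + region[y0][x0 + 1] * fx
    bottom = region[y0 + 1][x0] * (1 - fx) + region[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


print(bilinear([[0, 2], [4, 6]], 0.5, 0.5))  # 3.0
```

At the centre of the cell the result equals (a + b + c + d)/4, which matches the centre value used in the interpolation-based enlargement of Fig. 3B.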

(Tri-linear Processing) 
[0081] Fig. 5B illustrates an example in which data is
generated by means of tri-linear interpolation. Assume that
one computation surface region data SD has no data at the
address corresponding to the other surface region. In this
case, as shown in Fig. 5B, data is generated by interpolation
using the two surface region data SD adjacent to both sides of
the corresponding address of the other surface region. That
is, as observed in Fig. 5B, data is generated by the bilinear
interpolation method at the corresponding address in the
surface region data SD to the left of the address having no
data, and likewise at the corresponding address in the
surface region data SD to the right. By linearly interpolating
these two generated data, the data at the address which is
included in the other surface region and which is represented
by "O" is obtained. Such an interpolation method is called a
tri-linear interpolation method.

[0082] Even when a plurality of surface region data SD exist
discretely, a plurality of surface region data SD which
together constitute a solid (three-dimensional) region can be
obtained, for example, by means of the tri-linear method. That
is, an arbitrary three-dimensional region can be established
as an object to be computed for physical computation and the
like.
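
Following the steps of [0081], tri-linear interpolation can be sketched as a bilinear interpolation on each of the two neighbouring surface regions (left and right in Fig. 5B) followed by one linear interpolation between the two results. The parameter t, the position of the target address between the two surfaces in the range 0 to 1, and the function names are assumptions for illustration.

```python
import math


def bilinear(region, x, y):
    """Interpolate a 2-D region (list of rows, at least 2 x 2) at (x, y),
    assuming (x, y) lies inside the region."""
    h, w = len(region), len(region[0])
    x0 = min(math.floor(x), w - 2)
    y0 = min(math.floor(y), h - 2)
    fx, fy = x - x0, y - y0
    top = region[y0][x0] * (1 - fx) + region[y0][x0 + 1] * fx
    bottom = region[y0 + 1][x0] * (1 - fx) + region[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(left, right, x, y, t):
    """Tri-linear interpolation between two parallel surface regions.

    Bilinear interpolation is done at (x, y) on each surface, then the two
    results are linearly interpolated; t = 0 gives `left`, t = 1 gives `right`.
    """
    if not 0 <= t <= 1:
        raise ValueError("t must lie between the two surfaces (0 <= t <= 1)")
    a = bilinear(left, x, y)
    b = bilinear(right, x, y)
    return a * (1 - t) + b * t


print(trilinear([[0, 2], [4, 6]], [[10, 12], [14, 16]], 0.5, 0.5, 0.5))  # 8.0
```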

(Point Sampling) 

[0083] Point sampling is a method in which data is selected,
based on a predetermined regularity, from among surface region
data SD. Point sampling is used when the matrix computation
described later is performed.
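
Point sampling can be sketched with a regular stride rule. The specific regularity used here (every row_step-th row and every col_step-th column from a start offset) and the function name are assumptions for illustration; the text only says that the selection follows a predetermined regularity.

```python
def point_sample(region, row_step=1, col_step=1, row_start=0, col_start=0):
    """Select the elements of a 2-D region that lie on a regular lattice:
    rows row_start, row_start + row_step, ... and likewise for columns."""
    return [row[col_start::col_step] for row in region[row_start::row_step]]


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(point_sample(m, row_step=2, col_step=2))   # [[1, 3], [7, 9]]
print(point_sample(m, col_start=1, col_step=3))  # [[2], [5], [8]] (column 1)
```

Selecting a single row or column in this way is one plausible kind of access a matrix computation might need.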

(Mask Processing) 
[0084] Mask processing is a method in which data is 
discarded, based on a predetermined regularity, from among 
surface region data SD. 
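
Mask processing can be sketched as the complement of point sampling: a rule decides, address by address, whether an element is kept. In this sketch, discarded elements are replaced by a fill value so that the region keeps its shape; that choice, the checkerboard rule, and the function names are assumptions for illustration.

```python
def apply_mask(region, keep, fill=None):
    """Discard the elements of a 2-D region for which keep(row, col) is False,
    replacing them with `fill` so the region keeps its shape."""
    return [[v if keep(i, j) else fill for j, v in enumerate(row)]
            for i, row in enumerate(region)]


def checkerboard(i, j):
    """Example regularity: keep elements whose row + column index is even."""
    return (i + j) % 2 == 0


print(apply_mask([[1, 2], [3, 4]], checkerboard))  # [[1, None], [None, 4]]
```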

(Pattern Matching) 
[0085] Fig. 6 shows pattern matching processing. Pattern
matching finds the distance between patterns and is performed
using bilinear or tri-linear interpolation. In this processing,
the difference between the patterns (SPd = SPd - SPs) is
computed by surface computing processing, and then the
absolute value of the difference (SPd = |SPd - SPs|) or its
square (SPd = (SPd - SPs) * (SPd - SPs)) is computed. A
reduction-transfer is then performed by bilinear interpolation
to a destination surface whose side length is half that of the
source surface, and the total of the resultant differences is
obtained. The reduction-transfer is performed recursively
until a single scalar value, the final result, is obtained.
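
The recursive reduction of [0085] can be sketched as follows. Each 2:1 reduction here averages 2 × 2 blocks, which is what bilinear sampling at the block centres gives, so repeated reduction leaves the mean difference; the sketch multiplies by the element count to recover the total. The square, power-of-two pattern size and the function names are assumptions.

```python
def halve(region):
    """2:1 reduction: each output sample is the mean of one 2 x 2 block, which
    equals bilinear sampling at the block centre."""
    return [[(region[2 * i][2 * j] + region[2 * i][2 * j + 1]
              + region[2 * i + 1][2 * j] + region[2 * i + 1][2 * j + 1]) / 4
             for j in range(len(region[0]) // 2)]
            for i in range(len(region) // 2)]


def pattern_distance(sp_d, sp_s, squared=False):
    """Distance between two n x n patterns (n a power of two).

    Computes |SPd - SPs| (or its square) element-wise, then halves the
    difference surface recursively until a single scalar remains.
    """
    n = len(sp_d)
    shapes_ok = len(sp_s) == n and all(len(r) == n for r in sp_d + sp_s)
    if n == 0 or n & (n - 1) or not shapes_ok:
        raise ValueError("patterns must be n x n with n a power of two")
    diff = [[(d - s) ** 2 if squared else abs(d - s) for d, s in zip(rd, rs)]
            for rd, rs in zip(sp_d, sp_s)]
    while len(diff) > 1:
        diff = halve(diff)
    # Repeated averaging leaves the mean; scale by n * n to get the total.
    return diff[0][0] * n * n


print(pattern_distance([[1, 2], [3, 4]], [[1, 1], [1, 1]]))        # 6.0
print(pattern_distance([[1, 2], [3, 4]], [[1, 1], [1, 1]], True))  # 14.0
```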

[0086] Programs executed on the surface computer
appropriately select from among the above-described enlarging
processing, reducing processing, bilinear processing,
tri-linear processing, point sampling processing, mask
processing, pattern matching processing, and the like.

[Features of a Surface Computer and Comparison between
SISD-type and SIMD-type Computers and a Surface Computer]

[0087] Features of the surface computer are described below
in comparison with the conventional SISD (Single Instruction
Single Data stream) method and SIMD (Single Instruction
Multiple Data stream) method.

[0088] Figs. 7A to 7C illustrate an SISD-type computer, an 
SIMD-type computer, and the surface computer from the 
viewpoint of an instruction stream and hardware mechanism. 

[0089] In the SISD-type computer shown in Fig. 7A, the unit
computer 4a sequentially performs computations on single data
items (hereinafter referred to as point data) PD in accordance
with a single instruction. Therefore, when large-scale
computation is performed, a great amount of time is required,
which makes the SISD method inappropriate for this type of use.

[0090] In the SIMD-type computer shown in Fig. 7B, a
plurality of unit computers 4a concurrently perform
computations in accordance with a single instruction. Since
the SIMD-type computer includes a plurality of unit computers,
its overall instruction-executing sequence is concurrent. To
ensure this concurrence of the instruction-executing sequence,
the number of unit computers 4a is determined based on the
number of parallel instruction-executing sequences.

[0091] When large-scale computation is performed, the
SIMD-type computer performs much faster than the SISD-type
computer. However, as described above, there is demand for
physical computation and the like involving quantities of data
so large that even the SIMD-type computer cannot handle them.

[0092] The surface computer regards data as a surface.
That is, the surface computer specifies a data group based on
a surface region and processes it. Data which belongs to one
data group is transferred and processed as surface region data
SD in a single operation (or a few operations), which greatly
decreases the number of repetitions of data transfer and
computation. Consequently, the surface computer achieves high
computational performance.

[0093] Since the concept is to treat a group of data at one
time, the data may also be line data LD specified by a line,
as shown in Fig. 7C. Accordingly, the concept of the surface
computer includes a line computer.

[0094] A feature of the surface computer is that the
concurrency lies in the data itself. The number of unit
computers 4a is independent of the number of surface region
data SD, the number of instruction-executing sequences, and the
like. Since the surface computer computes the entirety of a
surface region including a plurality of computation data, the
computation data is processed concurrently. The computing
method for the surface region data and the number of unit
computers 4a required for that method are determined by the
computer program and are independent of the surface region
data SD. The surface computer may even perform the computation
itself sequentially.
[0095] However, when large quantities of data are processed
at high speed, the computations can clearly be accelerated by
providing a plurality of unit computers. Therefore, in the
present invention, 16 unit computers are provided in the PA 4.
As described below, superscalar or super pipeline architecture
can be used as the architecture of the PA 4 when needed, so
that computation is accelerated further.

[0096] The surface computer differs from the other two
methods in that:

(1) the SIMD-type computer has concurrence in the
instruction-executing sequence, while the surface computer has
concurrence in the data;

(2) the SISD-type and SIMD-type computers handle point data PD,
while the surface computer handles surface region data SD;

(3) since the surface computer handles surface region data SD,
its computer language has a different structure from those of
the SISD-type and SIMD-type computers, both of which handle
point data PD; that is, the form of the operand constituting an
instruction word differs between the surface computer and the
SISD-type and SIMD-type computers;

(4) the SIMD-type computer handles only discrete point data PD,
while the surface computer handles the entire data of a solid
surface region using the above-described inner interpolation
processing or the like (furthermore, the surface computer can
handle arbitrary surface region data in three dimensions); and

(5) the number of unit computers 4a of the SIMD-type computer
depends on the number of parallel instruction-executing
sequences, while the number of unit computers 4a of the surface
computer is independent of it.

[0097] Before these differences are described, the processing
of the surface computer is compared with that of the SISD-type
and SIMD-type computers, using as an example a typical matrix
computation in which large quantities of data are processed.

(Difference Between SISD-type and SIMD-type Computers and 
a Surface Computer with Respect to Matrix Computation) 
[0098] Figs. 8 to 10 illustrate large-scale matrix
computations performed by the SISD-type computer, the SIMD-type
computer, and the surface computer, respectively. The matrix
computation described here is used, for example, for coordinate
transformation of three-dimensional image data. In this case, a
column vector (x, y, z, w) constituting the image data is
coordinate-transformed into a point (X, Y, Z, W) using the
coefficient matrix shown in each of Figs. 8 to 10.
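
Written out in matrix form, the transformation that Figs. 8 to 10 compute in three different ways is, with element names matching the expressions that follow:

$$
\begin{pmatrix} X \\ Y \\ Z \\ W \end{pmatrix}
=
\begin{pmatrix}
a_{00} & a_{01} & a_{02} & a_{03} \\
a_{10} & a_{11} & a_{12} & a_{13} \\
a_{20} & a_{21} & a_{22} & a_{23} \\
a_{30} & a_{31} & a_{32} & a_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
$$

Each output component is one row of the matrix multiplied by the column vector; the SISD, SIMD and surface methods differ only in how these 16 products and their sums are scheduled.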

[0099] When such matrix computation is performed using the 
SISD method illustrated in Fig. 7A, the computations in the 
box in Fig. 8 are sequentially performed. 

$$a_{00} x + a_{01} y + a_{02} z + a_{03} w = X$$

[0100] In this case, for example, the computations are 
sequentially performed from the first term of the left-hand 
side. In the same manner, the following computations are 
sequentially performed: 

$$
\begin{aligned}
a_{10} x + a_{11} y + a_{12} z + a_{13} w &= Y \\
a_{20} x + a_{21} y + a_{22} z + a_{23} w &= Z \\
a_{30} x + a_{31} y + a_{32} z + a_{33} w &= W
\end{aligned}
$$

[0101] Since this sequence is repeated a number of times
according to the size of the data, it takes a substantial
amount of time to complete the above computations. In this
example, the matrix elements $a_{00}$ to $a_{33}$ are point data
PD, and, in this SISD method, products such as $a_{10} x$ are
computed one after another.
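
A minimal Python sketch of the sequential SISD schedule of Fig. 8, in which every product-and-add handles one item of point data PD. The function name is hypothetical; nested lists stand in for the coefficient matrix.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def sisd_transform(a: Matrix, v: Vector) -> Vector:
    """Compute (X, Y, Z, W) one term at a time, as in the SISD method."""
    result = []
    for row in a:                    # X, then Y, then Z, then W
        acc = 0.0
        for coeff, comp in zip(row, v):
            acc += coeff * comp      # a single term, e.g. a00 * x
        result.append(acc)
    return result
```

The inner loop runs once per matrix element, so the total work grows with every element of the data, which is the cost the text attributes to the SISD method.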

[0102] Next, when, as shown in Fig. 9, the same matrix 
computation is performed using the SIMD-type computer 
illustrated in Fig. 7B, four unit computers 4a substantially 
concurrently compute the corresponding four expressions shown 
in the box in Fig. 9 in accordance with a single instruction. 
[0103] A first unit computer 4a computes the products of the
matrix elements $a_{00}$, $a_{01}$, $a_{02}$, and $a_{03}$ and the
elements x, y, z, and w of the column vector, respectively, and
then computes the sum of the resultant products:

$$a_{00} x + a_{01} y + a_{02} z + a_{03} w = X$$

[0104] Substantially concurrently, a second unit computer 4a
computes:

$$a_{10} x + a_{11} y + a_{12} z + a_{13} w = Y$$

[0105] Substantially concurrently, a third unit computer 4a
computes:

$$a_{20} x + a_{21} y + a_{22} z + a_{23} w = Z$$

[0106] Substantially concurrently, a fourth unit computer 4a
computes:

$$a_{30} x + a_{31} y + a_{32} z + a_{33} w = W$$

[0107] Since these computations are performed substantially
concurrently, the SIMD-type computer processes much faster
than the SISD-type computer.
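
The SIMD schedule of Fig. 9 can be sketched by giving each row to its own worker, each worker standing in for one unit computer 4a. The names are hypothetical, and Python threads only model the concurrency here: because of CPython's global interpreter lock they will not actually speed up this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def row_dot(row: Vector, v: Vector) -> float:
    """Work done by one unit computer 4a: one row of Fig. 9."""
    return sum(c * x for c, x in zip(row, v))

def simd_transform(a: Matrix, v: Vector) -> Vector:
    """One worker per row, mimicking four unit computers running together."""
    with ThreadPoolExecutor(max_workers=len(a)) as pool:
        # pool.map returns results in row order, i.e. X, Y, Z, W
        return list(pool.map(row_dot, a, [v] * len(a)))
```

Note that the number of workers is tied to the number of rows (instruction-executing sequences), which is the dependency the text later contrasts with the surface computer.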

[0108] Fig. 10 illustrates a case in which the same matrix
computation is performed using the surface computer according
to the present invention. The columns of the 4-by-4 square
matrix on the left-hand side of the expression shown in the
upper part of Fig. 10 are treated as surface region data SD1,
SD2, SD3, and SD4, respectively, and the column vector on the
right-hand side is treated as surface region data SDv. These
surface region data are input to the PA 4, in which the
necessary multiplications and additions are applied to each of
the surface region data SD1, SD2, SD3, and SD4 together with
the surface region data SDv.

[0109] When the sizes of the surface regions specified by the
operands of the respective surface region data do not match,
the above-described enlarging/reducing processing and the like
are used to match them. In this case, inner interpolation
processing is not used; instead, point sampling processing or
mask processing is applied to each element of the matrices, so
that the surface computations are carried out while the
coefficient value of each element is preserved.

[0110] Specifically, when the necessary computations are
performed between each of the surface region data SD1, SD2,
SD3, and SD4 and the surface region data SDv, four sets of the
surface region data SD1, SD2, SD3, and SD4, amounting to 16
surface regions, are provided along with four sets of the
surface region data SDv. The products of SD1 and SDv, i.e.,
$a_{00} x$, $a_{10} x$, $a_{20} x$, and $a_{30} x$; the products
of SD2 and SDv, i.e., $a_{01} y$, $a_{11} y$, $a_{21} y$, and
$a_{31} y$; the products of SD3 and SDv, i.e., $a_{02} z$,
$a_{12} z$, $a_{22} z$, and $a_{32} z$; and the products of SD4
and SDv, i.e., $a_{03} w$, $a_{13} w$, $a_{23} w$, and
$a_{33} w$, are computed at the same time. Using the results of
these computations, the following computations are performed:

$$
\begin{aligned}
a_{00} x + a_{01} y + a_{02} z + a_{03} w &= X \\
a_{10} x + a_{11} y + a_{12} z + a_{13} w &= Y \\
a_{20} x + a_{21} y + a_{22} z + a_{23} w &= Z \\
a_{30} x + a_{31} y + a_{32} z + a_{33} w &= W
\end{aligned}
$$

[0111] Since the surface computer according to the present 
invention transfers and computes such surface region data once 
or several times, large-scale matrix computations and the like 
can be performed at high speed. 
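
The column-wise scheme of Fig. 10 can be sketched in Python, assuming that each column SDj is multiplied by the matching component of SDv (x, y, z, or w) broadcast across the whole column, which is the reading that produces the row sums X, Y, Z, and W. Lists stand in for surface regions and the function name is hypothetical; on the PA 4 each column step would be one region-wide operation rather than a Python loop.

```python
from typing import List

Column = List[float]

def surface_transform(columns: List[Column], v: List[float]) -> Column:
    """Matrix-vector product assembled from whole-column operations.

    columns[j] plays the role of SD(j+1), i.e. column j of the matrix,
    and v[j] is broadcast over that column.
    """
    result = [0.0] * len(columns[0])
    for col, comp in zip(columns, v):
        products = [c * comp for c in col]                  # e.g. a00*x, a10*x, a20*x, a30*x
        result = [r + p for r, p in zip(result, products)]  # accumulate into (X, Y, Z, W)
    return result
```

Here there are only as many steps as columns, independent of how the work inside each step is divided among unit computers.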

[Points of Difference between a Surface Computer and the Other 
Two Computers] 

[0112] The above-mentioned five differences will now be
described.

(First Point of Difference) 

[0113] First, the surface computer differs from the SIMD-type
computer in that the surface computer handles data
concurrently, while the SIMD-type computer handles
instruction-executing sequences concurrently.

[0114] However, the surface computer does not exclude the
concept of the SIMD method; the surface computer can be
constructed using that concept. For example, by providing a
plurality of surface computers, the overall system can be made
to execute instruction sequences concurrently, so that the
surface computer is constructed using the SIMD method.

(Second Point of Difference) 
[0115] Secondly, the surface computer differs from the
SISD-type and SIMD-type computers in that the surface computer
handles surface region data SD while the other two computers
handle point data PD. The surface computer is constructed so
as to be capable of transferring and computing two-dimensional
surface region data SD (including line data LD) in units of
regions.

[0116] For example, when surface region data SPs and SPd are
added as shown in Fig. 11, each of the surface region data SPs
and SPd is transferred and computed as a whole at one time.
Accordingly, the surface computer can handle data faster than
the SIMD-type computer, which repeatedly transfers and computes
point data PD sequentially. That is, since the surface computer
transfers and computes the data in the region specified by the
operand in units of surface regions, the number of repetitions
of transfer and computation is greatly decreased. Therefore,
high-speed computation can be realized.

(Alternatives of PA 4)

[0117] When high-speed computation is desired, acceleration
of computation processing is also important, in addition to
acceleration of data transfer. To accelerate processing, the
processor array according to the present invention employs
concurrent processing by a concurrent computer including the
16 unit computers 4a shown in Fig. 1. Instead of this
concurrent computer, a high-speed processor having, for
example, the construction shown in Fig. 12 can be employed.
The concurrent processor in Fig. 12 employs the superscalar and
super pipeline architectures.

[0118] The superscalar architecture accelerates processing by
increasing the multiplicity of the pipeline in the space
domain, that is, by causing a plurality of instruction
fetch/decode mechanisms and ALUs to operate concurrently. The
super pipeline architecture increases the multiplicity in the
time domain by deepening the pipeline.

[0119] In this example, at least 32 PAs 4 are provided as
shown in Fig. 12, and a pipeline having at least 32 steps is
provided in each PA 4. Accordingly, in this example
construction, at least 1024 instructions (32 pipeline steps x
32 PAs) can be executed concurrently.

(Third Point of Difference)
[0120] Thirdly, since the surface computer handles surface
region data SD, it has a different structure in computer
language from those of the SISD-type computer and the SIMD-type
computer, both of which handle point data PD. That is, the form
of the operand constituting an instruction word differs between
the surface computer and the SISD-type and SIMD-type computers.

(Difference in Operand Structure) 
[0121] The surface computer greatly differs from the 
conventional SISD-type and SIMD-type computers in that 
computation data in the surface region is specified and 
computed based on the surface region which is established as a 
unit. Accordingly, the surface computer has a different 
structure in the computer language from those of the
types of computers. 

[0122] As shown in Fig. 13A, in the conventional two types 
of computers, the instruction word (a typical single 
instruction) includes an operation code 131 and at least two 
operands 132. The operand 132 serves to index or specify the 
address of various computation data. As shown in Fig. 13B, 
the operand 132 includes a source operand 133 for specifying 
the source address of data and a destination operand 134 for 
specifying the destination address of the data. 

[0123] In the two conventional computers, since the data
specified by each operand is a single datum (a scalar value or
a vector value, that is, point data), computation consists of
repeatedly calling and processing single data (point data)
sequentially.

[0124] In the surface computer, the instruction word
includes the operation code and two operands. The surface
computer is the same as these two types of computers in that
the operand includes the source operand and the destination
operand. However, the operand of the surface computer
represents a two-dimensional region, which means that the
addressing mode represents a two-dimensional region.
Accordingly, the surface computer differs from the
conventional computers in that the surface computer specifies
surface region data while the conventional computers specify
point data.

[0125] The operand of the surface computer specifies the 
surface region by normally specifying, for example, four data 
points which enclose the surface region. 
[0126] The data specified by the operand in the instruction
word of the surface computer sometimes appear to be
one-dimensional. For example, a certain region can be specified
using a pointer (point) and a line L, where the point serves as
the base point of the line L and the length of the entire data
serves as the length of the line L. In this way, operands of
the surface computer can be represented by the point and the
line L.

[0127] In addition, as described above, the surface
computer includes a line computer. When a line computer is
used, the operands which specify line data include, for
example, two pointers.

[0128] Thus, since the operands of the surface computer 
according to the present invention are different from those in 
conventional computers, there are differences between the 
surface computer and the conventional computers in the 
structures of the computer language systems thereof, that is, 
the sets of the instruction words thereof. 

[0129] The following representation examples can be denoted 
as example operands representing the region of the surface 
region data. 

(1) square-point type (denoted by "SP"): SP(X1, Y1, X2, Y2)

(2) triangular type (denoted by "TR"): TR(X1, Y1, X2, Y2, X3, Y3)

(3) triangular mesh type (denoted by "TRM"): TRM(X1, Y1, X2, Y2, ..., Xn, Yn)

(4) line type (denoted by "LN"): LN(X1, Y1, X2, Y2)

where X and Y represent coordinates (addresses).
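
One way to model these four operand forms is as small immutable records, so that an instruction carries a region description rather than a single address. The class layout, the rectangle enumeration, and the point-in-triangle test (a standard edge-sign test) are illustrative assumptions; the source specifies only the coordinate lists.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

Point = Tuple[int, int]  # (X, Y) address

@dataclass(frozen=True)
class SP:
    """Square-point operand SP(X1, Y1, X2, Y2): corners at the ends of a diagonal."""
    x1: int
    y1: int
    x2: int
    y2: int

    def points(self) -> Iterator[Point]:
        """Yield every address in the rectangle, row by row."""
        for y in range(min(self.y1, self.y2), max(self.y1, self.y2) + 1):
            for x in range(min(self.x1, self.x2), max(self.x1, self.x2) + 1):
                yield (x, y)

@dataclass(frozen=True)
class TR:
    """Triangular operand TR(X1, Y1, X2, Y2, X3, Y3): the three vertices."""
    p1: Point
    p2: Point
    p3: Point

    def contains(self, p: Point) -> bool:
        """True if p lies inside the triangle or on its boundary."""
        def edge(a: Point, b: Point) -> int:
            # sign of the cross product: which side of edge a->b the point is on
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        signs = (edge(self.p1, self.p2), edge(self.p2, self.p3), edge(self.p3, self.p1))
        return not (min(signs) < 0 < max(signs))

@dataclass(frozen=True)
class TRM:
    """Triangular-mesh operand TRM(X1, Y1, ..., Xn, Yn): the selected points."""
    points: Tuple[Point, ...]

@dataclass(frozen=True)
class LN:
    """Line operand LN(X1, Y1, X2, Y2): the two end points."""
    start: Point
    end: Point

    def length(self) -> float:
        dx = self.end[0] - self.start[0]
        dy = self.end[1] - self.start[1]
        return (dx * dx + dy * dy) ** 0.5
```

For example, `SP(0, 0, 2, 1)` covers six addresses, while a single operand of a point-data instruction would cover only one.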

[0130] The square point type SP (1) represents the
coordinates of the pair of vertices at the two ends of a
diagonal of a square surface region.

[0131] The triangular type TR (2) represents the 
coordinates of the vertices of a triangular region. 
[0132] The triangular mesh type TRM (3) represents a set of 
specific points (selected based on a predetermined rule) in 
the triangular region. 

[0133] The line type LN (4) represents a line. 

[0134] The surface region can take an arbitrary form other 
than a rectangle or a triangle. Furthermore, in the square 
point type SP, the rectangular region may be represented with 
the coordinates of one vertex and the length of a side or a 
diagonal line passing through this vertex. 

(Relationship Between Operands Specifying Region and
Concurrent Computation)

[0135] The relationship between the operands of the line
computer and concurrent computation will now be described.
Fig. 14 illustrates a case in which line regions of mutually
different sizes (lengths, in this case) are computed. Line
region data 143 is obtained by reducing a region 141 specified
by a line operand 1, while line region data 144 is obtained by
enlarging a region 142 specified by a line operand 2, whereby
the lengths of the line region data 143 and 144 are matched.
The PA 4 performs surface (line) computing processing. After
computation, the resultant data is stored in a line region 145
specified by a line operand 3.

[0136] To match the sizes of the line regions, various
methods exemplified in the right half of Fig. 14, such as
copying, linear interpolation, sampling, enlarging processing,
and the like, are used in the same manner as in surface region
computing, so that the size of each line region is changed
appropriately.
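
The Fig. 14 flow can be sketched with linear interpolation as the size-matching method: both source line regions are resampled to a common length and then combined element by element. Using the length of line operand 3 as the common length, and the names `resample_linear` and `line_compute`, are assumptions for illustration.

```python
from typing import Callable, List

def resample_linear(data: List[float], new_len: int) -> List[float]:
    """Enlarge or reduce a line region to new_len samples by linear interpolation."""
    if new_len == 1 or len(data) == 1:
        return [data[0]] * new_len
    scale = (len(data) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        frac = pos - lo
        out.append(data[lo] * (1 - frac) + data[hi] * frac)
    return out

def line_compute(op1: List[float], op2: List[float], out_len: int,
                 fn: Callable[[float, float], float]) -> List[float]:
    """Match both line operands to out_len (assumed: line operand 3), then combine."""
    a = resample_linear(op1, out_len)   # e.g. region 141 reduced to 143
    b = resample_linear(op2, out_len)   # e.g. region 142 enlarged to 144
    return [fn(x, y) for x, y in zip(a, b)]
```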

(Concurrent Processing of PA and Many Operands) 
[0137] Fig. 15 shows the concurrent processing of the PA 4
and many operands. Computation data D3 and D4, stored in the
primary sense-amplifier 10 and specified by the operands 3 and
4, and computation data D1 and D2, stored in the secondary
sense-amplifier 11 and specified by the operands 1 and 2, are
input to a plurality of PAs 4. The resultant data obtained by
performing a desired operation on these data are stored in the
primary sense-amplifier 10. For example, when D3 = f1(D1, D2,
D3, D4) is computed, the resultant data is stored in the region
specified by operand 3; when D4 = f2(D1, D2, D3, D4) is
computed, the resultant data is stored in the region specified
by operand 4.

[0138] The fourth point of difference is that the surface
computer can use the data of the entire solid surface region
by means of inner interpolation processing or the like, while
the SIMD-type computer can handle only discrete point data.
This has already been described together with the
enlarging/reducing processing of the surface region and inner
interpolation processing.

[0139] The fifth point of difference is that the number of
unit computers 4a of the surface computer is independent of the
number of parallel instruction-executing sequences, while the
number of unit computers 4a of the SIMD-type computer depends
on it. This has also been described above, where the two
methods were compared.

[Typical Instruction Example of Surface Computer]

(Instruction and Data Flow) 
[0140] Instruction execution (control) of the surface
computer will now be described. Fig. 16 shows the instruction
and data flow for the example construction shown in Fig. 1, in
which the surface computer according to the present invention
functions as a coprocessor of the CPU 8.

[0141] This surface computer is interconnected with the CPU 8
and the main memory 7 via the BUS I/F 6 and the data bus. A
general-purpose surface computing instruction list 161 and
computation surface region data 162 are stored in the main
memory 7.

[0142] Under the control of the CPU 8, the IFDTC 5 causes the
general-purpose surface computing instruction list 161 to be
transferred directly to the IFU 2 via the BUS I/F 6 of the
surface computer. Likewise, the IFDTC 5 causes the computation
surface region data 162 to be transferred directly to the
DRAM 1 via the BUS I/F 6. Since the IFDTC 5 has the instruction
list 161 and the surface region data 162 transferred directly
to the surface computer without aid from the CPU 8, the load on
the CPU 8 is reduced. With this construction, not only
two-operand or three-operand computation, but also four-operand
computation, can be controlled.

(Condition Branch) 
[0143] Condition branch, which is a typical instruction of
the computer having the above construction, and an indirect
operand will now be described.
[0144] Fig. 17 illustrates a condition branch, one of the
functions of this surface computer. Under the control of the
IFDTC 5, the IFU 2 executes the condition branch based on the
value of a register. In the example shown in Fig. 17, a Move
instruction moves a point PTs of the surface region data 161 in
the DRAM 1 to a register #0 in the IFDTC 5; if the value of the
register #0 is then equal to 0, execution branches to Label 1
in the main memory 7. Thus, the surface computer according to
the present invention can switch processing in accordance with
the result of a computation. That is, the surface computer has
a function which enables the instruction fetch of the IFDTC 5
to branch conditionally in accordance with the value of the
register.

[0145] This processing is performed in the direction of an
arrow (1) in Fig. 1. Providing condition branch processing
makes the surface computer easier to program. In addition,
this condition branch processing is concurrent processing
which takes the place of the "jump" instruction in
conventional sequential processing.
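
The Fig. 17 behavior can be sketched with a toy interpreter: a move loads a value from the surface into a register, and a branch-if-zero jumps to a label. The tuple instruction encoding and the mnemonics `MOVE`, `BEQZ`, `LABEL`, `DO`, and `HALT` are hypothetical stand-ins, not the surface computer's actual instruction set.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction encoding: a tuple of (opcode, *operands).
Instr = Tuple

def run(program: List[Instr], surface: Dict[str, float]) -> List[str]:
    """Execute a toy instruction list that supports a conditional branch."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "LABEL"}
    regs = [0.0] * 4                          # stand-in for the IFDTC registers
    trace: List[str] = []
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "MOVE":                      # MOVE point, register
            regs[args[1]] = surface[args[0]]
        elif op == "BEQZ" and regs[args[0]] == 0:
            pc = labels[args[1]]              # branch taken: continue at the label
            continue
        elif op == "DO":                      # placeholder for real work
            trace.append(args[0])
        elif op == "HALT":
            break
        pc += 1
    return trace

program = [
    ("MOVE", "PTs", 0),        # point PTs -> register #0
    ("BEQZ", 0, "Label1"),     # branch if register #0 == 0
    ("DO", "fall-through"),
    ("HALT",),
    ("LABEL", "Label1"),
    ("DO", "at Label1"),
]
```

Running `run(program, {"PTs": 0.0})` takes the branch and returns `["at Label1"]`; a non-zero PTs falls through instead.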

(Indirect Operand)

[0146] Fig. 18 shows the indirect operand of this surface
computer. With an indirect operand, the source operand is read
and its value is used as the effective address for the next
access. In the indirect operand of the surface computer
according to the present invention, the coordinates of the
operand specifying the surface region data SD are themselves
represented as data in surface region data SD, and the operand
region is obtained from these coordinates.

[0147] In Fig. 18, the data indicated by an indirect operand
*TRMs, specified by a source operand TRMs, and the data of a
destination operand TRMd are added, and the resultant data is
set to TRMd. The PA 4 and the PP 9 handle this indirect operand
as shown in the direction of an arrow (2) in Fig. 1. Although
this surface computer processes a large quantity of data at
one time, the indirect operand mechanism is effective because
it reduces processing overhead.
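
A minimal sketch of the indirect addition of Fig. 18, under the assumption that each element of TRMs holds an (x, y) coordinate and that *TRMs gathers the memory values at those coordinates. The flat list layout and the name `indirect_add` are illustrative only.

```python
from typing import List, Tuple

Surface = List[List[float]]
CoordSurface = List[Tuple[int, int]]  # each element holds an (x, y) address

def indirect_add(memory: Surface, trms: CoordSurface, trmd: List[float]) -> List[float]:
    """TRMd = *TRMs + TRMd, where *TRMs dereferences the coordinates in TRMs."""
    gathered = [memory[y][x] for (x, y) in trms]   # the *TRMs step
    return [g + d for g, d in zip(gathered, trmd)]
```

Because the addresses are themselves surface data, a whole set of scattered locations is fetched by one operand instead of one pointer at a time.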

[0148] In this embodiment, coordinate data is 32-bit based
and may be treated as 32-bit (fixed-point or floating-point),
16-bit x 2 (fixed-point), 10-bit x 3 (fixed-point), or 8-bit x
4 (fixed-point) data. When the coordinates of a surface region
operand are represented by surface region data SD,
two-dimensional coordinates are represented using 16-bit x 2
(fixed-point) data, while three-dimensional coordinates for
tri-linear interpolation are represented using 10-bit x 3
(fixed-point) data. When coordinates are represented in 32
bits, two operands are required to represent two-dimensional
coordinates and three operands are required to represent
three-dimensional coordinates.
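
The 16-bit x 2 and 10-bit x 3 layouts can be illustrated with simple bit packing into one 32-bit word. The field order (x in the low bits) and the treatment of each field as a raw unsigned bit pattern are assumptions; the text does not say where the binary point of the fixed-point fields lies.

```python
from typing import Tuple

def pack_16x2(x: int, y: int) -> int:
    """Pack two 16-bit coordinate fields into one 32-bit word (x low, y high)."""
    if not (0 <= x < 1 << 16 and 0 <= y < 1 << 16):
        raise ValueError("each field must fit in 16 bits")
    return (y << 16) | x

def unpack_16x2(word: int) -> Tuple[int, int]:
    return word & 0xFFFF, (word >> 16) & 0xFFFF

def pack_10x3(x: int, y: int, z: int) -> int:
    """Pack three 10-bit coordinate fields into the low 30 bits of a word."""
    if not all(0 <= c < 1 << 10 for c in (x, y, z)):
        raise ValueError("each field must fit in 10 bits")
    return (z << 20) | (y << 10) | x

def unpack_10x3(word: int) -> Tuple[int, int, int]:
    return word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF
```

Packing in this way is what lets a single 32-bit element carry a full two- or three-dimensional address, whereas 32-bit coordinates need one operand per axis.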

[Concrete Example Computation of Surface Computer] 
(Instruction Set) 

[0149] Fig. 11 illustrates a case in which an add instruction
is executed. In this example, the surface region data SPs
(source operand) and the surface region data SPd (destination
operand) are added in units of surface regions. That is, the
surface region data SPs and SPd are added element by element
over the computation data belonging to the mutually matched
regions. This processing enables considerably faster
computation. An instruction set for this surface computer may
include conventional instructions such as ADD, SUB, MADD, and
MOVE.

(Computing Control)

[0150] Cases in which computation control is applied to an
instruction having two operands, instructions having three
operands, and an instruction having four operands are shown in
Figs. 19A to 19D. First, Fig. 19A shows computation control of
two operands. In Fig. 19A, an ADD or SUB operation is applied
to the source surface region data SPs and the destination
surface region data SPd:

ADD: SPd = SPd + SPs
SUB: SPd = SPd - SPs

The resultant data are stored at the address specified by the
destination surface. The example shown in Fig. 19A corresponds
to a case in which the products of the elements of the first
row of a coefficient matrix ($a_{00}$, $a_{01}$, $a_{02}$,
$a_{03}$) and the corresponding elements of a column vector
(x, y, z, w), that is, $a_{00} x$, $a_{01} y$, $a_{02} z$, and
$a_{03} w$, are computed and then the total of the resultant
products is computed.
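
The two-operand forms can be sketched as element-wise operations over matched regions, returning the new contents of SPd. Nested lists stand in for surface regions, and the function names are hypothetical.

```python
from typing import List

Surface = List[List[float]]

def surf_add(spd: Surface, sps: Surface) -> Surface:
    """ADD: SPd = SPd + SPs over every element of the matched regions."""
    return [[d + s for d, s in zip(rd, rs)] for rd, rs in zip(spd, sps)]

def surf_sub(spd: Surface, sps: Surface) -> Surface:
    """SUB: SPd = SPd - SPs over every element of the matched regions."""
    return [[d - s for d, s in zip(rd, rs)] for rd, rs in zip(spd, sps)]
```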

[0151] Figs. 19B and 19C show computing control of three
operands. In Figs. 19B and 19C, MADD (multiply and add) and
CMOV (condition move), respectively, are applied to the
computation data in the surface regions specified by the
operands SPd, SPs, and SPt. In Fig. 19B, the following
computation is performed:

MADD: SPd = SPd + SPs x SPt

That is, SPs x SPt is computed, the product is added to SPd,
and the result is stored at the address specified by SPd.
[0152] In Fig. 19C, SPs and SPt are compared; if the
condition SPs > SPt is satisfied, SPs is transferred to SPd:

CMOV, GT: if (SPs > SPt) SPd = SPs
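
A sketch of the three-operand forms as element-wise operations. Note that CMOV needs no branch: each element independently keeps or replaces its SPd value. Names and the nested-list representation are illustrative.

```python
from typing import List

Surface = List[List[float]]

def surf_madd(spd: Surface, sps: Surface, spt: Surface) -> Surface:
    """MADD: SPd = SPd + SPs x SPt, applied element by element."""
    return [[d + s * t for d, s, t in zip(rd, rs, rt)]
            for rd, rs, rt in zip(spd, sps, spt)]

def surf_cmov_gt(spd: Surface, sps: Surface, spt: Surface) -> Surface:
    """CMOV, GT: where SPs > SPt take SPs; elsewhere keep the old SPd."""
    return [[s if s > t else d for d, s, t in zip(rd, rs, rt)]
            for rd, rs, rt in zip(spd, sps, spt)]
```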

[0153] Fig. 19D shows computing control of four operands.
In this example, the MADD and CMOV instructions are applied to
the operands SPd, SPs, SPt, and SPc. Here, SPs and SPt are
multiplied, SPd is added to the resultant product, and the
result is compared with SPc. If the given condition is
satisfied, the value obtained by adding the above product to
SPd is set in SPd; otherwise, SPc is set in SPd. This is
expressed as follows:

MADD.CMOV.GT: if (SPd + SPs x SPt > SPc)
    SPd = SPd + SPs x SPt; else SPd = SPc
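
A sketch of the four-operand MADD.CMOV.GT form, following the description in [0153] that the result is written to SPd. Element-wise, this amounts to taking the larger of SPd + SPs x SPt and SPc. The function name is hypothetical.

```python
from typing import List

Surface = List[List[float]]

def surf_madd_cmov_gt(spd: Surface, sps: Surface, spt: Surface,
                      spc: Surface) -> Surface:
    """MADD.CMOV.GT: v = SPd + SPs x SPt; SPd = v if v > SPc else SPc."""
    out = []
    for rd, rs, rt, rc in zip(spd, sps, spt, spc):
        row = []
        for d, s, t, c in zip(rd, rs, rt, rc):
            v = d + s * t                  # MADD part
            row.append(v if v > c else c)  # CMOV.GT part
        out.append(row)
    return out
```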

[0154] Thus, in the surface computer, even when multiple
operands are used, the computation data is transferred and
computed at the same time in units of surface regions, so
computation efficiency is not reduced. As described above, the
surface computer according to the present invention can be
constructed so as to have multiple operands, which is
appropriate for a multiple pipeline.

[0155] Fig. 20 shows processing in which a so-called
"if-statement" in the C programming language is executed. As
noted in the description of the SIMD-type computer, such a
statement complicates programming when the same processing is
executed by a single instruction. Accordingly, processing is
controlled by providing a control surface operand. In Fig. 20,
a condition move of three operands SPd, SPs, and SPt is
executed using the condition-move instruction CMOV with the
condition GT, where CMOV denotes condition move and GT denotes
a greater-than comparison. The control surface region data SPt
and the source surface region data SPs are compared, and if the
condition SPs > SPt is satisfied, the data is moved to the
destination operand SPd. Thus, the surface computer according
to the present invention can switch processing in accordance
with the result of computations. As described above, this
condition branch processing can be concurrent processing which
takes the place of the conventional "jump" instruction.

(Condition-branch Instruction) 
[0156] Fig. 21 shows an example in which the control
surface data itself controls an operator. The condition-branch
instruction shown in this example is a surface effective
instruction EXDT having three operands SPd, SPs, and SPt; the
two-operand instruction specified by the control surface data
SPt is executed. Thus, when the surface computer performs
concurrent processing, condition branch processing can be
performed using control surface data. This example is
equivalent to the example shown in Fig. 17 if the control
surface data is used as the condition indicating that a
register of the IFDTC 5 is equal to zero and if the destination
address of the resultant data is Label 1 in the main memory 7.
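
A sketch of EXDT in which each element of the control surface SPt selects the two-operand operation applied at that position. The numeric codes in `OPS` are a hypothetical encoding; the source states only that SPt specifies the two-operand instruction.

```python
import operator
from typing import Callable, Dict, List

Surface = List[List[float]]

# Hypothetical encoding of control surface values as two-operand operations.
OPS: Dict[int, Callable[[float, float], float]] = {
    0: lambda d, s: d,   # no operation: keep SPd
    1: operator.add,     # ADD: SPd + SPs
    2: operator.sub,     # SUB: SPd - SPs
    3: lambda d, s: s,   # MOVE: SPs
}

def surf_exdt(spd: Surface, sps: Surface, spt: Surface) -> Surface:
    """EXDT: each element of control surface SPt picks the op applied there."""
    return [[OPS[int(c)](d, s) for d, s, c in zip(rd, rs, rc)]
            for rd, rs, rc in zip(spd, sps, spt)]
```

Every element takes a different path, yet the whole region is still handled by a single instruction, which is how control surface data replaces an explicit branch.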

[0157] In the embodiment, the surface computer according to
the present invention is used for large-scale computations in
various fields of natural science. This surface computer may
also be used for various simulations and for drawing in
three-dimensional animation, which can improve processing
speed, usability, and user satisfaction.